How to Safely Perform Backfill Operations in TimescaleDB

Recent versions of TimescaleDB have added much faster backfill operations (insert / upsert / update / delete) directly on compressed chunks, avoiding the need for the manual procedures discussed in this post. As of TimescaleDB 2.20 in particular, these manual procedures are no longer recommended.

Backfilling data into a TimescaleDB hypertable in production can be very tricky, especially when automated processes like compression policies are involved. From past experience, we have seen that if backfill operations aren’t handled properly, they can interfere with these automated tasks, sometimes causing them to stop working altogether. 

This blog covers a safer and more reliable approach to backfilling hypertables, along with best practices to prevent disruptions to compression and other background processes.

What is a Backfill Operation?

Backfilling means adding old or missing data to a database table after some time has already passed.

Imagine you are collecting temperature readings every hour, but your system was down for a day and didn’t save any data. Later, you retrieve that missing data from the device’s local storage or from cloud storage and insert it into the right hypertable; this is backfilling.

In TimescaleDB, this is common with time-series data, but it needs to be done carefully. That’s because TimescaleDB might already be doing things in the background, like compressing old data to save space. If we are not careful, backfilling can mess up these automatic tasks.
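To make this concrete, here is a minimal sketch of what a backfill looks like in SQL. The `conditions` hypertable, the `sensor-1` device, and the timestamps are hypothetical placeholders:

```sql
-- A hypothetical hypertable of hourly temperature readings.
CREATE TABLE conditions (
  time        TIMESTAMPTZ NOT NULL,
  device_id   TEXT        NOT NULL,
  temperature DOUBLE PRECISION
);
SELECT create_hypertable('conditions', 'time');

-- Backfilling: inserting yesterday's missing readings after the fact.
INSERT INTO conditions (time, device_id, temperature) VALUES
  ('2024-01-14 09:00:00+00', 'sensor-1', 21.4),
  ('2024-01-14 10:00:00+00', 'sensor-1', 21.9)
ON CONFLICT DO NOTHING;  -- harmless if some of these rows already exist
```

The inserts themselves are ordinary SQL; the complications described below only appear once compression and other background policies are in play.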

Prerequisites Before Performing a Backfill

Here are a few important steps to follow to ensure a smooth backfill in production and to be prepared in case something goes wrong.

Never Experiment on Production

It’s always best practice to avoid making changes directly in production, even if they seem minor. Every operation should be tested thoroughly in a staging environment first, to ensure it works as expected before being applied to production.

Even if the goal is to fix or investigate a known issue, making untested changes directly in production can still lead to unexpected downtime. Debugging or reproducing problems on a live system is risky and can disrupt availability, even when the change seems safe.

To avoid such risks, it’s important to maintain a staging environment that closely mirrors production. Before performing major operations like a backfill, test them in staging, observe how the system responds, and only proceed with the production deployment once you are confident it will behave as expected.

Any backfill scenario can vary in data size, duration, structure, and its effect on compression. Testing in staging helps surface potential issues early and ensures a safer, more stable rollout in production.

Back Up Production Before Performing a Backfill

This is another critical aspect of performing a safe backfill.

Backups provide the ability to restore the database to a previous state if something goes wrong during the process. 

To be extra cautious, it’s highly recommended to review the backup tooling you are using before starting the backfill. Confirm that recent backups are available and verify that Point-in-Time Recovery (PITR) is possible. This ensures that, in case of any unexpected issues, we can quickly recover the database with minimal disruption.

Select a Time with Low Customer Activity

As a best practice, we recommend selecting a time window with low database or server traffic for any maintenance activities, such as a backfill. This minimizes operational risks, limits the impact to a smaller number of customers if something goes wrong, and provides sufficient time to restore operations before peak hours.

Advance Customer Notification for Database Maintenance

Once a low-traffic window and date have been selected for the backfill, the next step is to notify customers about the upcoming database maintenance. Inform them that there may be temporary performance degradation and a possibility of minor operational inconsistencies during the process.

If the backfill proceeds as planned on the selected date and time, there should be no operational risks, and the database will continue running smoothly. However, if any issues arise, we will have sufficient time to restore the database to its original state, and customers will have already been informed.

Once all prerequisites are met, we can proceed with the backfill process.

Methods to Perform Backfill

In this section, we will cover three key methods for performing a major backfill operation in a production environment.

Method 1: Timescale Recommended Approach to Perform a Backfill

To perform a large-scale backfill, Timescale recommends the following approach:

  • Drop the existing compression policy.
  • Identify the chunks that will be affected by the backfill.
  • Decompress those specific chunks.
  • Perform the backfill operation.
  • Re-enable the compression policy.

Once re-enabled, the policy will automatically compress the chunks that were updated during the backfill.
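The steps above can be sketched as follows. The `conditions` hypertable, the `conditions_staging` source table, the backfill window, and the policy interval are all hypothetical placeholders; the `remove_compression_policy`, `decompress_chunk`, and `add_compression_policy` functions and the `timescaledb_information.chunks` view are part of the TimescaleDB API:

```sql
-- 1. Drop the existing compression policy so it can't run mid-backfill.
SELECT remove_compression_policy('conditions');

-- 2 & 3. Find and decompress the compressed chunks overlapping the
--        backfill window (here, January 2024). The second argument to
--        decompress_chunk() skips chunks that are already uncompressed.
SELECT decompress_chunk(format('%I.%I', chunk_schema, chunk_name)::regclass, true)
FROM timescaledb_information.chunks
WHERE hypertable_name = 'conditions'
  AND is_compressed
  AND range_start < '2024-02-01'::timestamptz
  AND range_end   > '2024-01-01'::timestamptz;

-- 4. Perform the backfill itself.
INSERT INTO conditions (time, device_id, temperature)
SELECT time, device_id, temperature FROM conditions_staging;

-- 5. Re-enable the policy; its next run recompresses the touched chunks.
SELECT add_compression_policy('conditions', INTERVAL '7 days');
```

Running the chunk query with a plain `SELECT chunk_schema, chunk_name` first is a cheap way to confirm which chunks will be decompressed before committing to the disk-space cost.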

Please note that this method requires prior knowledge of which chunks will be involved in the backfill. Decompressing these chunks can put significant pressure on disk space, which can grow rapidly depending on the volume of data. The process can also be time-consuming.

The main downside of this approach is that it can be significantly more expensive in terms of:

  • Disk space (since the affected chunks must be decompressed in full),
  • Time (due to the overhead of decompressing and recompressing),
  • and Cost (related to resource usage).

However, if you prefer to follow the approach recommended by Timescale and are comfortable with the trade-offs listed above, you can certainly proceed with it, as it is considered a more secure and trusted method.

We could also consider the following approaches that may better align with our needs.

Method 2: Perform Backfill on Compressed Data

An alternative approach involves the following steps:

  • Drop the compression policy on the hypertable.
  • Perform the complete backfill operation.
  • Re-enable the compression policy.

Once re-enabled, the policy will automatically compress the backfilled chunks during its next run.
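This approach is shorter because no chunks are decompressed up front. A minimal sketch, again using the hypothetical `conditions` and `conditions_staging` tables and a placeholder policy interval:

```sql
-- 1. Drop the compression policy for the duration of the backfill.
SELECT remove_compression_policy('conditions');

-- 2. Run the complete backfill in one pass.
INSERT INTO conditions (time, device_id, temperature)
SELECT time, device_id, temperature FROM conditions_staging;

-- 3. Re-enable the policy; backfilled chunks are compressed on its next run.
SELECT add_compression_policy('conditions', INTERVAL '7 days');
```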

While this method may appear simpler and more efficient than the previous approach, there is an important caveat.

In our testing, some SELECT queries occasionally failed while the compression policy was actively compressing the backfilled data. Interestingly, the same queries executed successfully and quickly once the compression process was completed.

If we choose to proceed with this method, it’s advisable to schedule it during periods of low server activity. This will help minimize the number of SELECT queries from customers and reduce the likelihood of errors.

Method 3: Perform the Backfill in Chunks

This is another alternate approach that can be used to perform a backfill. Here are the suggested steps:

  • Drop the compression policy temporarily.
  • Perform the backfill on compressed data in smaller chunks.

For example, if you need to backfill six months of data, don’t do it all at once. Instead, break it into one-month chunks and perform the backfill on alternate days or weeks. This means the backfill will run in multiple rounds.

After each round, re-enable the compression policy so it can compress the newly modified chunks.
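A single round of this batched backfill might look like the following. As before, `conditions`, `conditions_staging`, the one-month window, and the policy interval are hypothetical placeholders:

```sql
-- One round: backfill a single month, then restore the policy.
SELECT remove_compression_policy('conditions');

INSERT INTO conditions (time, device_id, temperature)
SELECT time, device_id, temperature
FROM conditions_staging
WHERE time >= '2024-01-01' AND time < '2024-02-01';

SELECT add_compression_policy('conditions', INTERVAL '7 days');

-- Optionally trigger the compression job right away instead of waiting
-- for its schedule (job_id from timescaledb_information.jobs):
-- CALL run_job(<job_id>);
```

Subsequent rounds repeat the same statements with the next month’s date range.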

Once you are confident that the process worked fine for the first month’s data, repeat the same steps for the remaining months, either the next day or the following week.

This method is only effective if:

  • You can split the backfill data into smaller subsets.
  • You have enough time (spread across days or weeks) to complete the backfill.

By using this approach, we reduce disk usage spikes and may also avoid the failing SELECT queries we encountered while testing Method 2 in staging, while retaining more control over the backfill operation.

Conclusion

While the recommendations in Method 1 are directly suggested by Timescale, the approaches outlined in Method 2 and Method 3 are based on our own learnings and experience. Before applying any method in a production environment, it’s important to thoroughly test it in staging. Ensure all necessary validations are complete, and move to production only once you are fully confident in the results.
